Manual merge of PRs #20394–#20397 (slice_copy + permute_copy) #20550
Pull Request resolved: pytorch#20394

Adds `aten.slice_copy.Tensor` to the WebGPU delegate as a gather: each output element is mapped back to its source input element along the sliced dim via `start + coord * step`.

Composition (single compute dispatch):
- `runtime/ops/slice/Slice.cpp`: reads `args = [self, dim, start, end, step, out]` via `read_scalar` (static `Int`/`Null`-sentinel default; throws on dynamic `SymInt`); normalizes negative `dim`/`start`, clamps `start` to `[0, in_size]`; builds two `TensorMeta` UBOs + a `SliceParams{dim, start, step}` uniform; guards fp32; dispatches over `compute_1d_workgroup_count(out.numel)` with `override wg_size`; releases all uniforms after the bind group.
- `runtime/ops/slice/slice.wgsl`: delinearizes the output index over the contiguous output strides, maps the sliced-dim coordinate back to the input (`start + coord*step`), relinearizes over the input strides.

ghstack-source-id: 397026527
@exported-using-ghexport

Differential Revision: [D108793168](https://our.internmc.facebook.com/intern/diff/D108793168/)
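The gather described above can be sketched in plain Python as a reference model. This is not the code from `Slice.cpp` or `slice.wgsl`: the function names `contiguous_strides` and `slice_copy_gather` are hypothetical, the input is assumed contiguous and row-major, and the handling of `end` (negative wrap and clamp) is an assumption, since the description only spells out the treatment of `dim` and `start`.

```python
from math import prod


def contiguous_strides(shape):
    """Row-major strides (in elements) for a contiguous tensor of `shape`."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def slice_copy_gather(data, in_shape, dim, start, end, step):
    """Hypothetical reference for a one-dim slice over a flat buffer.

    `None` stands in for the Null sentinel (use the default value).
    """
    ndim = len(in_shape)
    if step <= 0:
        raise ValueError("step must be positive")
    if dim < 0:
        dim += ndim
    in_size = in_shape[dim]

    start = 0 if start is None else start
    end = in_size if end is None else end
    if start < 0:
        start += in_size
    if end < 0:  # assumption: end is wrapped the same way as start
        end += in_size
    start = min(max(start, 0), in_size)
    end = min(max(end, 0), in_size)

    out_len = max(0, (end - start + step - 1) // step)
    out_shape = list(in_shape)
    out_shape[dim] = out_len

    in_strides = contiguous_strides(in_shape)
    out_strides = contiguous_strides(out_shape)

    out = []
    for out_idx in range(prod(out_shape)):
        rem = out_idx
        in_idx = 0
        for d in range(ndim):
            # Delinearize: recover the output coordinate along dim d.
            coord = rem // out_strides[d]
            rem %= out_strides[d]
            # Only the sliced dim is remapped; other coords pass through.
            if d == dim:
                coord = start + coord * step
            # Relinearize over the input strides.
            in_idx += coord * in_strides[d]
        out.append(data[in_idx])
    return out, out_shape
```

For a `[2, 8]` input holding `0..15`, slicing dim 1 with `start=1, end=5, step=1` yields the two rows `1..4` and `9..12`. Each loop iteration corresponds to one shader invocation in the single-dispatch design, so no invocation depends on another.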
…work)

Pull Request resolved: pytorch#20395

Registers `aten.slice_copy.Tensor` in the `cases.py` op-test framework: a `_slice_suite` of 4 configs (leading-dim slice `[:,1:5]`, last-dim slice `[...,1:3]`, step-2 `[:,0:8:2]`, negative-end `[:,1:-1]`) that `generate_op_tests` exports via `VulkanPartitioner` and compares to a torch golden on Dawn. Also adds `test/ops/slice/test_slice.py` (`SliceModule` + `CONFIGS` + export-delegation/eager smoke test) and the `aten.slice_copy.Tensor` partitioner-allowlist entry in `tester.py`.

ghstack-source-id: 397026537
@exported-using-ghexport

Differential Revision: [D108793151](https://our.internmc.facebook.com/intern/diff/D108793151/)
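As a quick sanity check on what the four slice expressions resolve to, Python's built-in `slice.indices` can normalize each one for a given dimension size. This is only an illustration: the dimension size of 8 is an assumption (the real input shapes live in `cases.py` and are not shown here), and the config names and `slice_params` helper are hypothetical.

```python
def slice_params(py_slice, dim_size):
    """Normalize a Python slice to (start, stop, step, output_length)."""
    start, stop, step = py_slice.indices(dim_size)
    return start, stop, step, len(range(start, stop, step))


DIM_SIZE = 8  # assumed size of the sliced dimension

# Hypothetical names; the slice expressions come from the suite description.
CONFIG_SLICES = {
    "slice_1_5": slice(1, 5),        # [:, 1:5]
    "last_dim_1_3": slice(1, 3),     # [..., 1:3]
    "step_2": slice(0, 8, 2),        # [:, 0:8:2]
    "negative_end": slice(1, -1),    # [:, 1:-1]
}

resolved = {name: slice_params(s, DIM_SIZE) for name, s in CONFIG_SLICES.items()}
for name, params in resolved.items():
    print(name, params)
```

Under that assumed size, the negative-end case resolves to `start=1, stop=7`, which is the kind of normalization the delegate has to reproduce for the golden comparison to pass.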
…ermute_copy.default)

Pull Request resolved: pytorch#20396

Adds `aten.permute_copy.default` (a coordinate-reorder gather) to the WebGPU delegate, and the `IntList` graph value type it needs to read its `dims` argument.

Composition:
- `runtime/WebGPUGraph.{h,cpp}`: adds `ValueType::IntList` backed by `std::vector<std::vector<int64_t>> int_lists_` + `get_int_list(int)`; `build()` deserializes `vkgraph::GraphTypes::IntList` via `value_as_IntList()->items()` (int64, matching the FlatBuffer `[long]`); mirrors the existing scalar value plumbing.
- `runtime/ops/permute/Permute.cpp`: reads the permutation via `get_int_list`, normalizes negative dims, validates it is a permutation of `[0, ndim)`, builds two `TensorMeta` UBOs + a `PermuteParams{perm: vec4<u32>}` uniform, guards fp32 + rank≤4, dispatches over `compute_1d_workgroup_count(out.numel)` with `override wg_size`; releases all uniforms after the bind group.
- `runtime/ops/permute/permute.wgsl`: delinearizes the output index over the contiguous output strides, reads `input` at `in.strides[perm[d]]` per dim (mirrors Vulkan `permute_buffer.glsl`).
- Registers both `aten.permute_copy.default` and `aten.permute.default` to the same handler.

ghstack-source-id: 397026548
@exported-using-ghexport

Differential Revision: [D108793162](https://our.internmc.facebook.com/intern/diff/D108793162/)
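The permute gather follows the same delinearize/relinearize pattern, with the output coordinate along dim `d` multiplied by the input stride of dim `perm[d]`. The sketch below is a plain-Python reference model under the assumption of a contiguous row-major input; `normalize_perm` and `permute_copy_gather` are hypothetical names, not functions from `Permute.cpp`, and the rank≤4 and fp32 guards are omitted.

```python
from math import prod


def contiguous_strides(shape):
    """Row-major strides (in elements) for a contiguous tensor of `shape`."""
    strides = [1] * len(shape)
    for d in range(len(shape) - 2, -1, -1):
        strides[d] = strides[d + 1] * shape[d + 1]
    return strides


def normalize_perm(perm, ndim):
    """Wrap negative dims, then check the result is a permutation of [0, ndim)."""
    dims = [p + ndim if p < 0 else p for p in perm]
    if sorted(dims) != list(range(ndim)):
        raise ValueError(f"{list(perm)} is not a permutation of [0, {ndim})")
    return dims


def permute_copy_gather(data, in_shape, perm):
    """Hypothetical reference: out[c0, c1, ...] reads in[] with c_d on dim perm[d]."""
    ndim = len(in_shape)
    perm = normalize_perm(perm, ndim)
    out_shape = [in_shape[p] for p in perm]

    in_strides = contiguous_strides(in_shape)
    out_strides = contiguous_strides(out_shape)

    out = []
    for out_idx in range(prod(out_shape)):
        rem = out_idx
        in_idx = 0
        for d in range(ndim):
            # Delinearize the output index over the output strides.
            coord = rem // out_strides[d]
            rem %= out_strides[d]
            # Output dim d is sourced from input dim perm[d].
            in_idx += coord * in_strides[perm[d]]
        out.append(data[in_idx])
    return out, out_shape
```

For example, permuting a `[2, 3]` buffer `0..5` with `[1, 0]` gives the transposed order `[0, 3, 1, 4, 2, 5]` with shape `[3, 2]`. Validating the permutation up front matters because a repeated dim would silently read the wrong elements rather than fail.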
…mework)

Pull Request resolved: pytorch#20397

Registers `aten.permute_copy.default` in the `cases.py` op-test framework: a `_permute_suite` of 4 configs (3D rotation, 4D middle-dim transpose, 2D transpose, full 4D shuffle) that `generate_op_tests` exports via `VulkanPartitioner` and compares to a torch golden on Dawn. Also adds `test/ops/permute/test_permute.py` (`PermuteModule` + `CONFIGS` + `_op_delegated` smoke test) and the `aten.permute_copy.default` partitioner-allowlist entry in `tester.py`.

ghstack-source-id: 397026550
@exported-using-ghexport

Differential Revision: [D108793156](https://our.internmc.facebook.com/intern/diff/D108793156/)
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/20550

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 5 Pending
As of commit dde991a with merge base b919db7.

This comment was automatically generated by Dr. CI and updates every 15 minutes.
psiddh left a comment
Approving this to unblock the diff train
Summary
Manual merge of four WebGPU-delegate op PRs that landed internally but could not auto-merge to `main`. These are stacked ghstack PRs: when the lower PRs in the stack merged, their head branches were deleted and these four PRs' base branches were orphaned, so the orig-PR proposer failed with `422 base invalid`. This PR re-lands the same four commits (identical content to the originals, flat test layout) as a clean stack on top of current `main`:

- `slice_copy` op (`aten.slice_copy.Tensor`)
- `slice_copy` op test suite (cases.py op-test framework)
- `permute_copy` + `IntList` graph support (`aten.permute_copy.default`)
- `permute_copy` op test suite (cases.py op-test framework)
Test plan
Each op ships with its `cases.py` op-test suite (exported via `VulkanPartitioner`, compared to a torch golden on Dawn) plus an export-delegation smoke test, exercised by the WebGPU op-test CI (`etvk-*`). Verified internally; content is identical to the original four PRs.

@diff-train-skip-merge